Inversion-Free Natural Gradient Descent on Riemannian Manifolds
Draca, Dario, Matsubara, Takuo, Tran, Minh-Ngoc
The natural gradient method is widely used in statistical optimization, but its standard formulation assumes a Euclidean parameter space. This paper proposes an inversion-free stochastic natural gradient method for probability distributions whose parameters lie on a Riemannian manifold. The manifold setting offers several advantages: one can implicitly enforce parameter constraints such as positive definiteness and orthogonality, ensure parameters are identifiable, or guarantee regularity properties of the objective like geodesic convexity. Building on an intrinsic formulation of the Fisher information matrix (FIM) on a manifold, our method maintains an online approximation of the inverse FIM, which is efficiently updated at quadratic cost using score vectors sampled at successive iterates. In the Riemannian setting, these score vectors belong to different tangent spaces and must be combined using transport operations. We prove almost-sure convergence rates of $O(\log{s}/s^α)$ for the squared distance to the minimizer when the step size exponent $α>2/3$. We also establish almost-sure rates for the approximate FIM, which now accumulates transport-based errors. A limited-memory variant of the algorithm with sub-quadratic storage complexity is proposed. Finally, we demonstrate the effectiveness of our method relative to its Euclidean counterparts on variational Bayes with Gaussian approximations and normalizing flows.
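The online inverse-FIM update described above can be sketched in the Euclidean special case, where the transport operation is the identity. With an exponentially weighted estimate $F_s = (1-\beta)F_{s-1} + \beta g g^\top$ built from sampled score vectors $g$, the Sherman–Morrison identity updates the inverse directly at quadratic cost, with no explicit inversion. The function name and the decay rate `beta` are illustrative, not taken from the paper:

```python
import numpy as np

def update_inverse_fim(F_inv, g, beta=0.05):
    """Sherman-Morrison update of an inverse-FIM estimate.

    Maintains the inverse of F_s = (1-beta) F_{s-1} + beta g g^T
    at O(d^2) cost per step, with no explicit matrix inversion.
    In the Riemannian setting, g would first be parallel-transported
    into the current tangent space; in this Euclidean sketch the
    transport is the identity.
    """
    Fg = F_inv @ g                                  # O(d^2) matrix-vector product
    denom = (1.0 - beta) + beta * (g @ Fg)          # Sherman-Morrison denominator
    return (F_inv - (beta / denom) * np.outer(Fg, Fg)) / (1.0 - beta)
```

The update agrees exactly with inverting the rank-one-corrected matrix, so the approximation error comes only from the stochastic score samples (and, on a manifold, from transport), not from the linear algebra.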
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.65)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Rankmax: An Adaptive Projection Alternative to the Softmax Function
Several machine learning models involve mapping a score vector to a probability vector. Usually, this is done by projecting the score vector onto a probability simplex, and such projections are often characterized as Lipschitz continuous approximations of the argmax function, whose Lipschitz constant is controlled by a parameter that is similar to a softmax temperature. The aforementioned parameter has been observed to affect the quality of these models and is typically either treated as a constant or decayed over time. In this work, we propose a method that adapts this parameter to individual training examples. The resulting method exhibits desirable properties, such as sparsity of its support and numerically efficient implementation, and we find that it significantly outperforms competing non-adaptive projection methods. In our analysis, we also derive the general solution of (Bregman) projections onto the (n, k)-simplex, a result which may be of independent interest.
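A standard instance of the projections the abstract discusses is the Euclidean projection of a score vector onto the probability simplex. The sorting-based $O(n \log n)$ algorithm below is that generic baseline, not the paper's adaptive Rankmax rule; note that, as with the methods discussed above, the output is sparse — entries with small scores are clipped exactly to zero:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of a score vector v onto the probability simplex.

    Solves min_p ||p - v||^2  s.t.  p >= 0, sum(p) = 1  in O(n log n).
    """
    u = np.sort(v)[::-1]                        # scores in decreasing order
    css = np.cumsum(u) - 1.0                    # cumulative sums minus total mass
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css)[0][-1]
    theta = css[rho] / (rho + 1.0)              # optimal shift (Lagrange multiplier)
    return np.maximum(v - theta, 0.0)
```

For example, a dominant score such as `[10, 1, 0]` projects to the sparse vertex `[1, 0, 0]`, illustrating the argmax-approximation behaviour described above.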
A Variational Manifold Embedding Framework for Nonlinear Dimensionality Reduction
Vastola, John J., Gershman, Samuel J., Rajan, Kanaka
Dimensionality reduction algorithms like principal component analysis (PCA) are workhorses of machine learning and neuroscience, but each has well-known limitations. Variants of PCA are simple and interpretable, but not flexible enough to capture nonlinear data manifold structure. More flexible approaches have other problems: autoencoders are generally difficult to interpret, and graph-embedding-based methods can produce pathological distortions in manifold geometry. Motivated by these shortcomings, we propose a variational framework that casts dimensionality reduction algorithms as solutions to an optimal manifold embedding problem. By construction, this framework permits nonlinear embeddings, allowing its solutions to be more flexible than PCA. Moreover, the variational nature of the framework has useful consequences for interpretability: each solution satisfies a set of partial differential equations, and can be shown to reflect symmetries of the embedding objective. We discuss these features in detail and show that solutions can be analytically characterized in some cases. Interestingly, one special case exactly recovers PCA.
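As the abstract notes, one special case of the framework exactly recovers PCA. A minimal sketch of that linear solution via the SVD (the function name is ours; by Eckart–Young, projection onto the top-$k$ right singular vectors is the squared-error-optimal rank-$k$ linear embedding):

```python
import numpy as np

def pca_embed(X, k):
    """Optimal linear embedding: the PCA special case.

    Among all rank-k linear maps, projecting centered data onto the
    top-k right singular vectors minimizes squared reconstruction
    error (Eckart-Young).
    """
    Xc = X - X.mean(axis=0)                       # center the data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                           # top-k principal directions
    return Xc @ components.T, components          # embedded coords, basis
```

The nonlinear embeddings produced by the variational framework generalize this map while, per the abstract, retaining an interpretable variational characterization.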
A Probability Mass Function of M
Before proving Theorem 1, we establish regularity via three conditions: symmetry, shift-invariance, and monotonicity. For any permutation matrix $\Pi$ with induced permutation $\pi$, we have $\Pr[M(q) = r] = \Pr[M(\Pi q) = \pi(r)]$, which implies $M$ is symmetric as desired. We first prove two lemmas; the second gives a useful fact about partial sums of a non-decreasing sequence.
InqEduAgent: Adaptive AI Learning Partners with Gaussian Process Augmentation
Yang, Wen-Xi, Zhao, Tian-Fang, Liu, Guan, Yang, Liang, Liu, Zi-Tao, Chen, Wei-Neng
However, most study partners are currently selected either through experience-based assignment with little scientific planning or via rule-based machine assistants, which struggle with knowledge expansion and lack flexibility. This paper proposes an LLM-empowered agent model, named InqEduAgent, for simulating and selecting learning partners tailored to inquiry-oriented learning. Generative agents are designed to capture the cognitive and evaluative features of learners in real-world scenarios. An adaptive matching algorithm with Gaussian process augmentation is then formulated to identify patterns within learners' prior knowledge, providing optimal learning-partner matches for learners facing different exercises. Experimental results show that InqEduAgent performs best in most knowledge-learning scenarios and across LLM environments with different capability levels. This study promotes the intelligent allocation of human-based learning partners and the design of AI-based learning partners.
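The abstract does not spell out the matching algorithm, so the following is only an illustrative sketch of the generic Gaussian-process machinery one would use to predict a learner–partner compatibility score from prior-knowledge features, with uncertainty; all names and the RBF-kernel choice are our assumptions:

```python
import numpy as np

def gp_predict(X_train, y_train, X_test, length_scale=1.0, noise=1e-2):
    """GP regression sketch (RBF kernel) for scoring candidate partners.

    Illustrative only -- not InqEduAgent's actual algorithm. Predicts a
    compatibility score (posterior mean) and its uncertainty (posterior
    variance) from prior-knowledge feature vectors.
    """
    def rbf(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / length_scale**2)

    K = rbf(X_train, X_train) + noise * np.eye(len(X_train))
    Ks = rbf(X_test, X_train)
    mean = Ks @ np.linalg.solve(K, y_train)                       # posterior mean
    var = 1.0 - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T))
    return mean, var
```

Candidates would then be ranked by posterior mean, with the variance flagging learners for whom the model's knowledge pattern is still uncertain.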
- Research Report > New Finding (0.66)
- Research Report > Experimental Study (0.46)
- Education > Educational Setting > Online (0.93)
- Education > Educational Technology > Educational Software > Computer Based Training (0.69)
- Health & Medicine (0.68)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.68)
A Recurrent Neural Network based Clustering Method for Binary Data Sets in Education
Ohira, Mizuki, Saito, Toshimichi
This paper studies an application of a recurrent neural network to a clustering method for the S-P chart, a binary data set widely used in education. As the number of students increases, the S-P chart becomes hard to handle. In order to classify a large chart into smaller charts, we present a simple clustering method based on the network dynamics: the network has multiple fixed points, and their basins of attraction give clusters corresponding to small S-P charts. To evaluate the clustering performance, we introduce an important feature quantity, the average caution index, which characterizes the singularity of students' answer patterns. Fundamental experiments confirm the effectiveness of the method.
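The fixed-point clustering idea can be sketched with a generic Hopfield-style recurrent network (this is an illustration of the mechanism, not the paper's specific network): cluster prototypes are stored as fixed points, each student's binary answer row evolves under the dynamics, and the fixed point it reaches labels its cluster.

```python
import numpy as np

def hopfield_cluster(patterns, rows, steps=20):
    """Fixed-point clustering sketch (Hopfield-style dynamics).

    `patterns` are binary cluster prototypes stored as fixed points via
    Hebbian weights; each binary row in `rows` is iterated until it
    settles, and is labeled by the nearest stored pattern.
    """
    P = 2.0 * np.asarray(patterns, float) - 1.0     # {0,1} -> {-1,+1}
    W = P.T @ P / P.shape[1]                        # Hebbian weight matrix
    np.fill_diagonal(W, 0.0)                        # no self-connections
    labels = []
    for row in rows:
        s = 2.0 * np.asarray(row, float) - 1.0
        for _ in range(steps):                      # synchronous sign updates
            s = np.sign(W @ s)
            s[s == 0] = 1.0
        labels.append(int(np.argmax(P @ s)))        # nearest stored pattern
    return labels
```

In the S-P setting, the basin of attraction a student's answer pattern falls into plays the role of the small chart it is assigned to.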
Checklist
1. For all authors: (a) Do the main claims made in the abstract and introduction accurately reflect the paper's contributions and scope? Did you discuss any potential negative societal impacts of your work?
2. If you ran experiments: (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] Link provided. (b) Did you specify all the training details (e.g., data subsets, hyperparameters, how they were chosen)? (c) Did you report error bars (e.g., with respect to the random seed after running experiments multiple times)? (d) Did you include the total amount of compute and the type of resources used (e.g., type of GPU)?
3. If you are using existing assets: Did you include any new assets either in the supplemental material or as a URL? [No] Did you discuss whether and how consent was obtained from people whose data you're using?
4. If you used crowdsourcing or conducted research with human subjects... (a)
All experiments were run on a GeForce RTX 2080 Ti GPU. The model used in the toy experiment follows the one used in Belinkov et al. (2019b). These representations are concatenated and passed to a one-hidden-layer feed-forward network for binary classification.